-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-29239][SPARK-29221][SQL] Subquery should not cause NPE when eliminating subexpression #25925
Conversation
expr.find(_.isInstanceOf[LambdaVariable]).isDefined || | ||
// `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor, | ||
// can cause unexpected error. | ||
expr.isInstanceOf[PlanExpression[_]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a way to only skip it on executors? This may still be useful when we codegen at driver side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like check TaskContext.get != null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah that's a good idea!
expr.find(_.isInstanceOf[LambdaVariable]).isDefined | ||
expr.find(_.isInstanceOf[LambdaVariable]).isDefined || | ||
// `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor, | ||
// can cause unexpected error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there isn't any other reason, shall we mention NPE specifically instead of unexpected error
? Both SPARK-29239 and SPARK-29221 are due to NPE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I updated this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding including NPE
after unexpected error
? IMHO unexpected error
is actually correct (we can't predict which error we will get), but it would help much if we enumerate known errors as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I realized my browser didn't show the comment from @viirya . It's just a 2 cents and like NPE
seems OK to me.
Test build #111333 has finished for PR 25925 at commit
|
Test build #111359 has finished for PR 25925 at commit
|
expr.find(_.isInstanceOf[LambdaVariable]).isDefined || | ||
// `PlanExpression` wraps query plan. To compare query plans of `PlanExpression` on executor, | ||
// can cause error like NPE. | ||
(expr.isInstanceOf[PlanExpression[_]] && TaskContext.get != null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for curiosity, does this issue happen in interpreted code path as well? e.g. we send PlanExpression
to executor side and eval it, and hit NPE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC,EquivalentExpressions
is only used in the codegen mode now, e.g., GenerateUnsafeProjection
uses this class in common subexpr elimination, but `InterpretedUnsafeProject does not elimnate common subexprs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand your question correctly. But PlanExpressions of a SparkPlan are evaluated and updated (e.g., ExecSubqueryExpression.updateResult) with values before a query begins to run. The values are kept in PlanExpression, and on executor side when to call eval of PlanExpression, it simply returns the kept value. I think we do not really evaluate a PlanExpression at executor side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah got it, so the kept value is serialized and sent to executor side in interpreted code path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue also reminds me that it's better to always do codegen at driver side, even if whole-stage-codegen is false. We can investigate it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Please let me know if you have some ideas later.
thanks, merging to master! @viirya can you send another PR to 2.4? I tried to backport but it has conflicts. |
@cloud-fan Ok. Let me create a backport PR. |
@cloud-fan I just tried to make a backport. In branch-2.4, Project uses an UnsafeProjection API which is not covered by CODEGEN_FACTORY_MODE config yet. So can not have an end-to-end test like this. WDYT? Should we make a backport with unit test against EquivalentExpressions, or skip backport? |
do we have to trigger the bug via |
It captures NonFatal so actually it can fallback when NPE. |
It captures NonFatal when compiling the code, not running the code, right? |
Yes, it captures NonFatal when compiling the code. NPE happens when generating the code to be compiled on executor. |
ah I see. Then we can skip backporting as users can still run the query |
What changes were proposed in this pull request?
This patch proposes to skip PlanExpression when doing subexpression elimination on executors.
Why are the changes needed?
Subexpression elimination can possibly cause NPE when applying on execution subquery expression like ScalarSubquery on executors. It is because PlanExpression wraps query plan. To compare query plan on executor when eliminating subexpression, can cause unexpected error, like NPE when accessing transient fields.
The NPE looks like:
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added unit test.